65 research outputs found

    Mutual support of ligand- and structure-based approaches : to what extent we can optimize the power of predictive model? : case study of opioid receptors

    The process of modern drug design would not exist in its current form without computational methods. They are part of every stage of the drug design pipeline, supporting the search for and optimization of new bioactive substances. Nevertheless, despite the great help offered by in silico strategies, the power of computational methods strongly depends on the input data supplied when the predictive model is constructed. Studies on the efficiency of computational protocols most often focus on global efficiency: they use general parameters that refer to the whole dataset, such as accuracy, precision, or mean squared error. In this study, we examined machine learning predictions obtained for opioid receptors (mu, kappa, delta) and focused on the cases for which the predictions were the most and the least accurate. Moreover, we used docking to try to explain the prediction errors. We attempted to develop a rule of thumb that can help in predicting compound activity towards opioid receptors via docking, especially for compounds that were incorrectly predicted by machine learning. We found that although combining the ligand- and structure-based paths can benefit prediction accuracy, there remain cases that cannot be reliably predicted by any available modeling method. In addition to challenging ligand- and structure-based predictions, we also examined the role of machine learning methods in comparison to simple statistical methods, for both standard ligand-based representations (molecular fingerprints) and interaction fingerprints. All approaches were compared in both classification experiments (where compounds were assigned to active and inactive groups on the basis of Ki values) and regression experiments (where the exact Ki value was predicted).
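The two experimental setups described in this abstract (classification from Ki-based groups, regression on the Ki value itself) could be prepared roughly as sketched below. The abstract does not state the cut-offs, so the 100 nM and 1000 nM thresholds and both function names are assumptions made only for illustration.

```python
import math

# Assumed cut-offs (not given in the abstract): compounds with Ki at or below
# 100 nM count as active, those at or above 1000 nM as inactive.
ACTIVE_MAX_KI_NM = 100.0
INACTIVE_MIN_KI_NM = 1000.0


def pki_from_nm(ki_nm):
    """Regression target: pKi = -log10(Ki in mol/L), with Ki given in nM."""
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    return 9.0 - math.log10(ki_nm)


def label_from_ki(ki_nm):
    """Classification target: 'active', 'inactive', or None for the grey zone."""
    if ki_nm <= ACTIVE_MAX_KI_NM:
        return "active"
    if ki_nm >= INACTIVE_MIN_KI_NM:
        return "inactive"
    return None  # compounds between the cut-offs are left out of classification
```

Leaving a gap between the two cut-offs is one common design choice; a single threshold would assign every compound to a class instead.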

    Low cost prediction of probability distributions of molecular properties for early virtual screening

    While there is a general focus on predicting values, predicting probability distributions is mathematically more appropriate, and it opens additional possibilities such as prediction of uncertainty, higher moments and quantiles. For the purposes of computer-aided drug design, this article applies the Hierarchical Correlation Reconstruction approach, previously applied in the analysis of demographic, financial and astronomical data. Instead of a single linear regression to predict values, it uses multiple linear regressions to independently predict multiple moments, finally combining them into a predicted probability distribution, here of several ADMET properties based on the substructural fingerprint developed by Klekota & Roth. The application example discussed is the inexpensive selection of a percentage of molecules with properties nearly certain to be in a predicted or chosen range during virtual screening. Such an approach can facilitate the interpretation of the results, as predictions characterized by high uncertainty are automatically detected. In addition, for each of the investigated predictive problems, we detected crucial structural features, which should be carefully considered when optimizing compounds towards a particular property. The whole methodology developed in the study therefore constitutes a great support for medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics and guides the compound optimization process. Comment: 5 pages, 6 figures
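The idea of predicting several moments with independent linear regressions and combining them into a density can be illustrated with a minimal sketch. It assumes the target property has already been normalized to [0, 1], uses a single scalar feature instead of a full fingerprint, and keeps only two basis functions; all function names are hypothetical, and this is not the authors' implementation.

```python
import math

# Orthonormal shifted Legendre polynomials on [0, 1]; f_0(y) = 1 is implicit.
BASIS = [
    lambda y: math.sqrt(3) * (2 * y - 1),
    lambda y: math.sqrt(5) * (6 * y * y - 6 * y + 1),
]


def fit_line(xs, ts):
    """Ordinary least squares for t ~ b0 + b1 * x with one scalar feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b1 = sxt / sxx
    return mean_t - b1 * mean_x, b1


def fit_moments(xs, ys):
    """One independent regression per basis function: a_j(x) ~ E[f_j(y) | x]."""
    return [fit_line(xs, [f(y) for y in ys]) for f in BASIS]


def density(coefs, x, y):
    """Predicted density rho(y | x) = 1 + sum_j a_j(x) * f_j(y) on [0, 1].

    The raw polynomial can dip below zero; a practical version would need
    a calibration step, which is omitted here.
    """
    return 1.0 + sum((b0 + b1 * x) * f(y) for (b0, b1), f in zip(coefs, BASIS))
```

Because every basis function integrates to zero on [0, 1], the predicted density integrates to one regardless of the fitted coefficients.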

    Robust optimization of SVM hyperparameters in the classification of bioactive compounds

    Background: Support Vector Machine has become one of the most popular machine learning tools used in virtual screening campaigns aimed at finding new drug candidates. Although it can be extremely effective in finding new potentially active compounds, its application requires the optimization of the hyperparameters with which the assessment is being run, particularly the C and γ values. The optimization requirement, in turn, establishes the need to develop fast and effective approaches to the optimization procedure, providing the best predictive power of the constructed model. Results: In this study, we investigated the Bayesian and random search optimization of Support Vector Machine hyperparameters for classifying bioactive compounds. The effectiveness of these strategies was compared with the most popular optimization procedures: grid search and heuristic choice. We demonstrated that Bayesian optimization not only provides better, more efficient classification but is also much faster: the number of iterations it required to reach optimal predictive performance was the lowest of all tested optimization methods. Moreover, for the Bayesian approach, the choice of parameters in subsequent iterations is directed and justified; therefore, the results obtained with it improve steadily, and the range of hyperparameters tested provides the best overall performance of the Support Vector Machine. Additionally, we showed that random search optimization of hyperparameters leads to significantly better performance than grid search and heuristic-based approaches. Conclusions: The Bayesian approach to the optimization of Support Vector Machine parameters was demonstrated to outperform other optimization methods for tasks concerned with the bioactivity assessment of chemical compounds. This strategy not only provides a higher accuracy of classification, but is also much faster and more directed than other approaches to optimization. It appears that, despite its simplicity, the random search optimization strategy should be used as a second choice if applying the Bayesian approach is not feasible.
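Random search over C and γ, the fallback recommended in the conclusions, can be sketched in a few lines. The search ranges, the log-uniform sampling and the `toy_score` function are assumptions made for illustration; in a real experiment the score would be the cross-validated performance of an SVM trained with the sampled values.

```python
import math
import random


def random_search(score, n_iter, log_c_range=(-2.0, 5.0),
                  log_gamma_range=(-10.0, 3.0), seed=0):
    """Sample (C, gamma) log-uniformly and keep the best-scoring pair.

    `score` is any callable (C, gamma) -> float where higher is better.
    The default ranges are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        c = 10 ** rng.uniform(*log_c_range)
        gamma = 10 ** rng.uniform(*log_gamma_range)
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated accuracy, peaked at C=10, gamma=0.01."""
    dist2 = (math.log10(c) - 1.0) ** 2 + (math.log10(gamma) + 2.0) ** 2
    return math.exp(-dist2 / 8.0)


best_score, best_c, best_gamma = random_search(toy_score, n_iter=50, seed=1)
```

Sampling on a logarithmic scale matters here because useful values of C and γ typically span several orders of magnitude.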

    Generation of new inhibitors of selected cytochrome P450 subtypes- "In silico" study

    The physicochemical and pharmacokinetic profile of a compound has a crucial impact on its potential to become a future drug. Ligands with the desired activity profile cannot be used for treatment if they are characterized by unfavourable physicochemical or ADMET properties. In the study, we consider metabolic stability and focus on selected subtypes of cytochrome P450, proteins that take part in the first phase of compound transformations in the organism. We develop a protocol for the generation of new potential inhibitors of selected cytochrome isoforms. Its subsequent stages comprise the generation and assessment of new derivatives of known cytochrome inhibitors, docking, and evaluation of the possible inhibition on the basis of the obtained ligand-protein complexes. Besides the library of new potential agents inhibiting particular cytochrome subtypes, we also prepare a graph neural network that predicts the change in activity for all modifications of the starting molecule. In addition, we perform a systematic statistical study on the influence of particular substitutions on the potential inhibition properties of the generated compounds (both mono- and di-substitutions are considered), provide explanations of the inhibitory predictions, and prepare an on-line visualization platform enabling manual inspection of the results. The developed methodology can greatly support the design of new cytochrome P450 inhibitors, with the overarching goal of generating new metabolically stable compounds. It enables instant evaluation of possible compound-cytochrome interactions and selection of the ligands with the highest potential for possessing the desired biological activity.

    The influence of the inactives subset generation on the performance of machine learning methods

    Background: Machine learning methods have become increasingly popular in virtual screening over the past few years, in both classification and regression tasks. However, their effectiveness is strongly dependent on many different factors. Results: In this study, the influence of the way the set of inactives is formed on the classification process was examined: random and diverse selection from the ZINC database, the MDDR database, and libraries generated according to the DUD methodology. All learning methods were tested in two modes: using one test set, the same for each method of generating inactive molecules, and using test sets with inactives prepared in the same way as for training. The experiments were carried out for 5 different protein targets, 3 fingerprints for molecule representation and 7 classification algorithms with varying parameters. The process of inactive set formation turned out to have a substantial impact on the performance of machine learning methods. Conclusions: The level of chemical space limitation determined the ability of the tested classifiers to select potentially active molecules in virtual screening tasks; for example, DUDs (widely applied in docking experiments) did not provide proper selection of active molecules from databases with diverse structures. The study clearly showed that the inactive compounds forming the training set should be representative, to the highest possible extent, of the libraries that undergo screening.

    Active learning of compounds activity : towards scientifically sound simulation of drug candidates identification

    Virtual screening is one of the vital elements of the modern drug design process. It is aimed at identifying potential drug candidates in large datasets of chemical compounds. Many machine learning (ML) methods have been proposed to improve the efficiency and accuracy of this procedure, with Support Vector Machines among the most popular. Most commonly, performance in this task is evaluated in an offline manner, where the model is tested after training on a randomly chosen subset of data. This is in stark contrast to the practice of drug candidate selection, where a researcher iteratively chooses batches of compounds to test next. This paper proposes to frame this problem as an active learning process, where we search for new drug candidates through exploration of the compound space simultaneously with the exploitation of current knowledge. We introduce a proof of concept of the simulation and evaluation of such a pipeline, together with novel solutions based on mixing clustering and a greedy k-batch active learning strategy.
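One possible way to mix clustering with greedy k-batch selection is sketched below. The abstract does not describe the actual strategy, so the split into an exploitation part (top-scoring compounds) and an exploration part (best compound from each cluster not yet represented), as well as the `select_batch` name and its parameters, are assumptions for illustration only.

```python
def select_batch(scores, clusters, k, n_explore):
    """Pick up to k compound indices for the next round of testing.

    scores    -- model score per compound (higher = more promising)
    clusters  -- precomputed cluster id per compound
    n_explore -- number of batch slots reserved for unseen clusters
    """
    # Indices sorted from the most to the least promising compound.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Exploitation: fill the unreserved slots greedily with the top scores.
    batch = order[: max(k - n_explore, 0)]
    seen_clusters = {clusters[i] for i in batch}

    # Exploration: take the best compound from each cluster not yet in the batch.
    for i in order:
        if len(batch) >= k:
            break
        if i not in batch and clusters[i] not in seen_clusters:
            batch.append(i)
            seen_clusters.add(clusters[i])

    # Fallback: if clusters ran out, top up greedily with remaining compounds.
    for i in order:
        if len(batch) >= k:
            break
        if i not in batch:
            batch.append(i)
    return batch
```

In a simulated active learning loop, the selected compounds would be labelled, added to the training set, and the model retrained before the next call.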

    Generative models should at least be able to design molecules that dock well : a new benchmark

    Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a widely used computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that various graph-based generative models fail to propose molecules with a high docking score when trained using a realistically sized training set. This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we also include simpler tasks in the benchmark, based on a simpler scoring function. We release the benchmark as an easy-to-use package available at https://github.com/cieplinski-tobiasz/smina-docking-benchmark. We hope that our benchmark will serve as a stepping stone toward the goal of automatically generating promising drug candidates.

    Multiple conformational states in retrospective virtual screening : homology models vs. crystal structures : beta-2 adrenergic receptor case study

    Background: Distinguishing active from inactive compounds is one of the crucial problems of molecular docking, especially in the context of virtual screening experiments. The randomization of poses and the natural flexibility of the protein make this discrimination even harder. Some of the recent approaches to post-docking analysis use an ensemble of receptor models to mimic this naturally occurring conformational diversity. However, the optimal number of receptor conformations is yet to be determined. In this study, we compare the results of a retrospective screening of beta-2 adrenergic receptor ligands performed on both an ensemble of receptor conformations extracted from ten available crystal structures and an equal number of homology models. Additional analysis was also performed for homology models with up to 20 receptor conformations considered. Results: The docking results were encoded into Structural Interaction Fingerprints and were automatically analyzed by a support vector machine. The use of homology models in this virtual screening application proved superior to the use of crystal structures. Additionally, increasing the number of receptor conformational states led to more effective discrimination of active from inactive compounds. Conclusions: For virtual screening purposes, the use of homology models was found to be most beneficial, even in the presence of crystallographic data regarding the conformational space of the receptor. The results also showed that increasing the number of receptor conformations considered improves the effectiveness of identifying active compounds by the machine learning method.